Distributed representations of prediction error signals across the cortical hierarchy are synergistic

A relevant question concerning inter-areal communication in the cortex is whether these interactions are synergistic. Synergy refers to the complementary effect of multiple brain signals conveying more information than the sum of each isolated signal. Redundancy, on the other hand, refers to the common information shared between brain signals. Here, we dissociated cortical interactions encoding complementary information (synergy) from those sharing common information (redundancy) during prediction error (PE) processing. We analyzed auditory and frontal electrocorticography (ECoG) signals in five awake common marmosets performing two distinct auditory oddball tasks and investigated to what extent event-related potentials (ERP) and broadband (BB) dynamics encoded synergistic and redundant information about PE processing. The information conveyed by ERPs and BB signals was synergistic even at lower stages of the hierarchy in the auditory cortex and between auditory and frontal regions. Using a brain-constrained neural network, we simulated the synergy and redundancy observed in the experimental results and demonstrated that the emergence of synergy between auditory and frontal regions requires the presence of strong, long-distance, feedback, and feedforward connections. These results indicate that distributed representations of PE signals across the cortical hierarchy can be highly synergistic.

In this exciting paper, Gelens et al. use two auditory protocols in non-human primates with ECoG to study how prediction error is processed across brain regions. To that end, they computed co-information (a method that distinguishes redundant from synergic information processing). This paper is fairly well written; the authors present an impressive number of results that, at certain points, are difficult for the reader to integrate. Overall, I am impressed with the work done by the authors. This work will certainly constitute an important addition to the field. However, before recommending publication, there are some aspects that the authors need to clarify to validate their results and conclusions entirely.
1) My main concern is that in all of the presented analyses, there is a bias towards synergic interactions that might be driven not by the processing of the stimulus itself but by the interaction with unrelated ongoing activity. This can be directly observed in the interactions between the baseline period (-100 to 0 ms) and the post-stimulus period (0 to 300 ms): in all cases (experimental data, Figures 3, 4, 5, 6, and model, Figures 7 and 8) there is a synergic interaction (blue). In my interpretation, this synergic interaction between stimulus-related and non-related activity is a bias that probably affects the rest of the analysis and needs to be corrected. A potential way to do it is to do a baseline correction of the co-information computation and remove the mean value of these periods from the rest of the analysis. This will probably reduce the quantified synergy (and augment the redundancy). The authors must quantify if the results (and the conclusions) change after this control.
2) The authors present a computational model to explore potential explanations for the results. The authors argue that comparing a computational model including feedforward and feedback connections with one including feedforward connections alone demonstrates that feedback connections are responsible for the emergence of synergic processing. I'm afraid I have to disagree with the authors that the models themselves can demonstrate any neuronal mechanism. Models are helpful to interpret data and generate predictions that can subsequently be tested experimentally. The results obtained with the model are not evidence of how the mechanism is implemented in the brain. In addition, the results presented by the authors comparing the two types of models (Figures 7 and 8) are suggestive of a reduction of synergy. Still, there is no quantification of this result (overall, the simulations coming out of the two models look quite similar: panels B/D and C/F in Figure 7 and panels B/D in Figure 8).

Reviewer #1 (Remarks to the Author):
The manuscript addresses the question of information processing in distributed neural networks and, specifically, whether local activities in distinct brain regions are redundant or synergistic. The notion of synergy is defined here as evidence of a form of information sharing (called here co-information) between the neural activity patterns captured at distinct recording sites. These ideas are tested using a remarkable set of widespread electrocorticographic (ECoG) recordings in 5 common marmosets, 3 and 2 of them subjected to two different auditory tasks known to elicit prediction error signals. The main message of the report is that prediction error signals are encoded across multiple brain sites in a synergistic manner, in addition to well-documented regional manifestations in e.g. auditory regions.

Response:
We thank the reviewer for their initial evaluation and constructive criticism, and we appreciate the opportunity to improve our methodological approach and the overall quality of the manuscript. We have added new analyses and figures, as well as discussion, to address the reviewer's concerns point by point:

Although the scientific question is significant and the nature of the data is outstanding, I find the premises of the study and the approaches rather unclear, and the actual effects in the data unconvincing.
A major issue with the interpretation of redundant vs. synergistic effects between pairs of recording sites is the lack of control for possible trivially redundant information contributed by a third site or more. Indeed, I understand that synergistic vs. redundant effects are ascertained on the basis of pairwise measures of co-information between two recording sites. These pairwise measures rest on the unstated hypothesis that network interactions in the marmoset brain take place between two brain nodes, irrespective of the activity taking place at a third location, or more. This issue of partial measures of interactions is well known in network neuroscience but unfortunately seems to have been overlooked by the authors.
Response: Thanks for the comment. We would like to emphasise, however, that typical network studies relate the activity in different areas. Such activity-based analyses typically consider a correlation (or mutual information) between the time-varying activity of two regions. This correlation might reflect a genuine connection, but could also reflect a third region that independently connects to both regions. Conditional measures (e.g. partial correlation, conditional mutual information) can address this, with some caveats: for example, conditioning out does not actually fully remove the contribution of the conditioned node, because conditioned measures include synergistic effects (e.g. Williams and Beer, 2010). However, applying these sorts of measures can be challenging, as one must condition out all other recorded nodes. In any case, there may always be other nodes which, for whatever practical reasons, were not accessed with the experimental setup (see Mehler & Kording, 2018).
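The third-region confound described above can be illustrated with a minimal sketch. It is not the authors' analysis: the simulated signals `x`, `y`, `z` and the helper `partial_corr` are hypothetical, and a simple linear partial correlation stands in for the conditional measures mentioned in the text.

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation between x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

rng = np.random.default_rng(0)
n = 5000
z = rng.standard_normal(n)             # hypothetical third region driving both others
x = z + 0.5 * rng.standard_normal(n)   # region X: common drive plus private noise
y = z + 0.5 * rng.standard_normal(n)   # region Y: common drive plus private noise

raw = np.corrcoef(x, y)[0, 1]          # large, although X and Y are not directly coupled
partial = partial_corr(x, y, z)        # close to zero once z is conditioned out
print(f"raw r = {raw:.2f}, partial r = {partial:.2f}")
```

In this toy case the raw correlation is produced entirely by the common drive, which is the situation conditional measures are meant to detect. It only works here because `z` was recorded, which is the practical limitation the response points out.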
While we acknowledge the limitation that our study is based on pairwise representational interactions, we believe the focus on representational interactions of experimentally controlled prediction errors means this is a novel and valuable contribution. We would also like to emphasise that our primary scientific goal concerns two inherently pairwise questions: first, relating the representation of stimuli between different time points (cf. the temporal generalization cross-decoding method, see King & Dehaene, 2014), and second, relating activity between auditory sensory processing regions and frontal regions often implicated in control or higher-level processing. These are pairwise questions, and so we argue pairwise measures of representational interaction are sufficient and useful to approach them. To address the reviewer's comments about the exclusion of certain sites in the channel-wise analysis, we have extended the manuscript with a new methodology that uses a multivariate pattern analysis of each region. We still perform a pairwise comparison of the representation between frontal and temporal regions, but now we can include all channels within each region. We detail this below.
In order to directly address the reviewer's concern, we have developed an entirely new approach for computing co-information based on the response across multiple electrodes (which we term the MVCo-I method). In particular, we have applied a multivariate analysis approach that uses machine learning to capture the best linear representation of the prediction error signal across a whole region, and we have repeated our co-information analyses within and between the two brain regions of interest (frontal and temporal) using the classifiers' outputs. As the reviewer will see in the next series of new control results, the patterns of synergy and redundancy observed both within and between regions for both tasks replicate the results observed using the electrode pairwise comparisons. The description of the MVCo-I method, as well as the changes made in the manuscript regarding its use as a control analysis for potential informative spatial patterns of redundancy and synergy that might be missed by our original electrode-pairwise co-I analyses, are reproduced below.
Lines 254-283: "Although the per-electrode and electrode-pair analyses of synergy and redundancy exploit the optimal spatial resolution of the recording modality across temporal and frontal regions, they could also miss information encoded in the spatial pattern both within and between temporal and frontal areas. They could therefore potentially miss synergy or redundancy that is only apparent when considering multiple electrodes together, either due to low signal-to-noise ratio within each channel, or because of a genuinely distributed informative spatial pattern. This might be particularly relevant for the ERP signals that showed extensive temporal and frontal PE effects (Figure 1A,B). Thus, to account for potential informative spatial patterns of redundancy and synergy in ERP responses, and to reduce any concern about high-order interactions between channels within each region in the pairwise channel analysis, we complemented our analyses by computing co-information based on the response across multiple electrodes (MVCo-I: Multi-Variate Co-information) (Figure S11 and S12). In brief, we have applied a cross-validated multivariate analysis approach that uses machine learning to capture the best linear representation of the prediction error signal across a whole region, and we have repeated our co-information analyses within and between the two brain regions of interest (frontal and temporal) using the classifiers' outputs (see Methods).
The MVCo-I analyses within-region (Figures S13 and S15) and between-regions (Figures S14 and S16) showed comparable co-I in terms of synergistic and redundant dynamics observed in the per-electrode (Figures 3,5) and in the between-electrodes (Figures 4,6) analyses, but with increased statistical power (i.e., increased MI)."
Lines 993-1078:

"Multivariate Co-Information Method (MVCo-I):
Mutual information quantifies statistical dependence on the meaningful effect-size scale of bits. Crucially, these values are additive when combining independent representations. This allows us to quantify representational interactions between electrodes as synergistic or redundant using co-I as described in the manuscript. However, estimating mutual information on high-dimensional responses is challenging. Multi-Variate Pattern Analysis (MVPA) is an approach that has been widely adopted in neuroimaging and neuroscience to deal with high-dimensional signals (King and Dehaene, 2014). MVPA uses techniques from the field of machine learning, namely supervised learning algorithms and cross-validation, to learn informative patterns in high-dimensional data and evaluate their generalisation performance (i.e. how well the model could predict in new data). Here we use linear discriminant analysis to learn the most informative linear combination of channel activity in each region to predict the binary class of the stimulus (i.e. deviant vs standard). There are various metrics for evaluating the cross-validated predictive performance of classification algorithms, for example overall accuracy, or measures like area under the ROC curve (Poldrack et al., 2020). These metrics can be used to rank models based on different features (i.e. compare the amount of information in temporal vs frontal regions). Here, we combine MVPA with information-theoretic co-information to quantify the representational interactions in predictions made from cross-validated multivariate models. To do this, we first apply MVPA in the typical way (here using the MVPA-Light toolbox; Treder, 2020) using 10-fold cross-validation (CV). In a 10-fold CV, the overall dataset is randomly separated into 10 disjoint subsets. Then, a model is fit on 9 of those subsets and tested on the 10th, and this is repeated for each of the 10 subsets. Here, we take the decision value of the learned classifier (the value of the linear combination of the weights and the data, which would then be thresholded to make the classification) for each test-set trial. This quantifies how strongly the informative pattern the classifier had learned was present in the data on that trial. We combine the test-set decision values from all 10 different CV repetitions and calculate the mutual information between these out-of-sample decision values and the true stimulus value on each trial (Yan et al., 2023; Yan et al., 2023b). We have used MVPA to reduce the activity of the multi-channel region into a single scalar value: the decision value.
We can repeat the MVPA analysis for each time point of the stimulus-locked epochs. Often, temporal cross-decoding (King and Dehaene, 2014) is employed with MVPA to compare the consistency of the informative patterns over time. For this method, a classifier is trained at time t, and then tested (in the hold-out test folds) at other times. If it can decode, this shows that the same pattern that was learned at time t is informative at other times. However, this can only compare between data sets or conditions that are in the same physical space, i.e. we can cross-decode across time within one brain region, but we cannot compare between two different brain regions, because there is no way to apply the linear weights learned in the frontal region to the completely different temporal electrodes. Combining MVPA with co-information (MVCo-I) overcomes this limitation. We compute co-information between the cross-validated decision values of different classifiers. This admits the same interpretation as for the channel-wise analysis. Redundancy shows that there is common information accessed by the two decoding models. Synergy means that there is a super-additive boost in the information available when considering the pattern activation from both models together. When estimating the joint information for the co-information calculation we take the maximum of three quantities: the individual-region MIs (because the data processing inequality tells us this is a lower bound on the information that can be extracted from the joint response); the MI from the combined d-vals (a 2D signal; this has the advantage of being a low-dimensional response for MI calculation, and of being the optimal informative signal from each region); and the MI from a joint MVPA model fit to the combination of channels from both regions (1D d-vals, but which has the possibility to include synergistic information between the regions, which we want to capture with this measure).
We apply this methodology here in two ways. First, we look at within-area MVCo-I. For this, we train CV classifier models separately at each time point. We then calculate the co-information between two time points using the cross-validated decision values of the two models. Note that a crucial difference between this and the temporal cross-decoding method is that we always use the model that is learned to optimally decode information at that time point. Cross-decoding can tell if the same pattern is informative, but we can see redundant information even when the informative pattern changes. We can then compute MVCo-I between regions in the same way."

In this method, the classifier can extract all the (linearly available) information in each region. We hope this addresses some of the reviewer's concerns about redundant third channels, at least in the regions we consider.
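The between-region MVCo-I steps quoted above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the data are simulated, the two-class LDA is hand-written in NumPy rather than taken from the MVPA-Light toolbox, a plug-in estimator on quantile-binned decision values stands in for the authors' bias-corrected MI estimator, the joint term uses only the 2D decision values (not the maximum of three estimates), and the sign convention (positive co-I = redundancy) is assumed. All function and variable names are hypothetical.

```python
import numpy as np

def lda_decision_values(X, y, n_folds=10, seed=0):
    """Out-of-sample two-class LDA decision values from k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    dvals = np.empty(len(y))
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        Xtr, ytr = X[train], y[train]
        mu0, mu1 = Xtr[ytr == 0].mean(0), Xtr[ytr == 1].mean(0)
        # Pooled within-class covariance, lightly regularised for stability
        centred = np.vstack([Xtr[ytr == 0] - mu0, Xtr[ytr == 1] - mu1])
        cov = centred.T @ centred / (len(ytr) - 2) + 1e-6 * np.eye(X.shape[1])
        w = np.linalg.solve(cov, mu1 - mu0)
        # Decision value: projection of held-out trials onto the learned pattern
        dvals[test] = (X[test] - (mu0 + mu1) / 2) @ w
    return dvals

def binned_mi(stim, *signals, n_bins=4):
    """Plug-in MI (bits) between a binary stimulus and quantile-binned signals."""
    codes = np.zeros(len(stim), dtype=int)
    for sig in signals:
        edges = np.quantile(sig, np.linspace(0, 1, n_bins + 1)[1:-1])
        codes = codes * n_bins + np.digitize(sig, edges)

    def entropy(v):
        p = np.bincount(v) / len(v)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    return entropy(codes) + entropy(stim) - entropy(codes * 2 + stim)

# Simulated data: both regions read out the same noisy trial-by-trial latent,
# so the expected interaction is redundant.
rng = np.random.default_rng(1)
n_trials = 2000
stim = rng.integers(0, 2, n_trials)              # 0 = standard, 1 = deviant
latent = stim + rng.standard_normal(n_trials)    # shared signal plus shared noise
temporal = latent[:, None] * rng.uniform(0.5, 1.5, 8) + 0.5 * rng.standard_normal((n_trials, 8))
frontal = latent[:, None] * rng.uniform(0.5, 1.5, 6) + 0.5 * rng.standard_normal((n_trials, 6))

d_temporal = lda_decision_values(temporal, stim)
d_frontal = lda_decision_values(frontal, stim)
mi_t = binned_mi(stim, d_temporal)
mi_f = binned_mi(stim, d_frontal)
mi_joint = binned_mi(stim, d_temporal, d_frontal)
co_i = mi_t + mi_f - mi_joint                    # assumed convention: > 0 means redundancy
print(f"MI temporal={mi_t:.3f}, MI frontal={mi_f:.3f}, co-I={co_i:.3f} bits")
```

Because each region gets its own classifier, no weights have to be transferred between electrode sets, which is the limitation of cross-decoding that the quoted passage describes. The within-area variant would apply the same steps to decision values from two time points of one region.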

Figure S11. Schematic of the MVCo-I method.
To calculate co-information between multivariate responses we use 10-fold cross-validation (CV) with Linear Discriminant Analysis (LDA). For each of the 10 hold-out folds, we train an LDA classifier to discriminate the stimulus category of a trial (i.e. deviant vs standard tone). We then compute the classifier decision values (d-vals; i.e. the linear combination of the learned pattern weights and the raw data) on each trial for the hold-out fold. We then concatenate the CV d-vals from all folds. We have used LDA as a cross-validated supervised dimensionality reduction method to obtain a one-dimensional representation of the region's activity at that time point, which is maximally discriminative between conditions. We can then compute the co-information between the ground-truth stimulus class of each trial and the CV d-vals from two different classifiers (i.e. either from the same region at different time points, within-area co-I, or from different regions, between-areas co-I). This calculation is the same as for channel-wise analysis (see Methods).

It is a significant issue because effects presently interpreted as synergistic between two recording sites may actually be caused by the activity at a third site or more, and be redundant, not synergistic, with the activity at these sites.

Response:
We don't agree with this assessment. A major feature of our study is that we do not relate neural activity between two nodes (as in classical network analyses as mentioned above), but instead, we anchor our measures on the experimental stimulus manipulation, i.e. we look at and relate the information different regions carry about the stimulus. Our synergistic interactions are not between the unconstrained resting-state activity of 3 brain regions, but are about an interaction in the representation of an experimentally controlled stimulus contrast (here the oddball PE manipulation) between two regions. This experimental control puts us in the regime of "randomised control trial", the gold standard for determining causal relations in experimental science, as opposed to the "observational study" analytic regime which is typical for activity-based resting-state network neuroscience.
Given this fundamental difference, we don't agree that the restriction to pairwise representational interaction is such a limitation. By definition, the existence of synergistic information about the oddball means there is information that is not available from either region individually but is available from the two together. Either region could be correlated to any degree with any number of third regions, but that wouldn't change the fact about the synergy: that the relationship in the activity between these two regions conveys information that is not available from either one alone. In our view, this is a novel and useful statement about distributed neural activity during prediction error. For example, there could be a third site which has a representation that is redundant with the information that is represented synergistically between the two sites, but this does not affect the synergistic relationship between the two sites. There could be a third site which conveys information redundantly with either or both of the considered sites, but again, this would not change the fact that there is synergy between those two. There could be a third site with activity unrelated to the stimulus, or indeed any other source of common noise or global state shared by the two regions, that could also result in synergy (because observing one gives an estimate of the common noise that can improve the prediction of the stimulus based on the second). But this is just a fact of these sorts of measures: observing information-theoretic synergy is a statistical result about the system, not a mechanistic one. So we do not make any resulting claims about connectivity between areas, which is the goal of typical resting-state functional connectivity, where partialling out other areas is of course crucial for such interpretations.
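The definitional point above, and the common-noise route to synergy, can be made concrete with a toy discrete example. It is a sketch only: the binary variables are hypothetical, and the sign convention used here (co-I = I(S;X) + I(S;Y) - I(S;X,Y), positive for redundancy and negative for synergy) is assumed rather than taken from this response.

```python
from collections import Counter
from itertools import product
from math import log2

def entropy(samples):
    """Shannon entropy in bits, treating the listed outcomes as equiprobable samples."""
    counts = Counter(samples)
    n = len(samples)
    return -sum(c / n * log2(c / n) for c in counts.values())

def mutual_info(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B)."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def co_information(s, x, y):
    """co-I = I(S;X) + I(S;Y) - I(S;X,Y); assumed sign: > 0 redundancy, < 0 synergy."""
    return mutual_info(s, x) + mutual_info(s, y) - mutual_info(s, list(zip(x, y)))

# Redundant case: both regions carry an identical copy of the stimulus bit.
s_red = [0, 1]
redundant_coi = co_information(s_red, s_red, s_red)

# Common-noise case: region X sees the stimulus corrupted by a noise bit,
# region Y sees only the noise. Neither is informative alone; together they are.
pairs = list(product([0, 1], repeat=2))      # all (stimulus, noise) combinations
s_syn = [s for s, n in pairs]
x_syn = [s ^ n for s, n in pairs]            # stimulus XOR noise
y_syn = [n for s, n in pairs]                # noise only
synergy_coi = co_information(s_syn, x_syn, y_syn)

print(redundant_coi, synergy_coi)
```

In the second case region Y carries no stimulus information at all, yet observing it allows the stimulus to be read out from X. This is the statistical, non-mechanistic sense of synergy that the response describes.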
Therefore, the measures of co-information between two sites need to be assessed conditionally to the activity at the other recording sites, just like when partial correlations are derived from fMRI or other time series to establish causal connections between brain regions.

Response:
As above, we argue there is a fundamental difference in the experimental goals, the interpretation of the results, and the methodology used in our study compared to typical resting-state, activity-based functional correlations.
Although controlling partial redundancy may not be crucial in most network studies, it is crucial here because the entire purpose of the study is to establish whether pairwise brain site interactions are redundant or not.To do this properly, it is essential to control for possible redundant interactions from regions outside the tested pair of brain sites.

Response:
We don't agree with this point. Our scientific question is fundamentally pairwise: we wish to relate prediction error between frontal and temporal regions to gain insight into the hierarchical implementation of this important process. Because our representational interactions are anchored on the experimentally controlled stimulus, we do not see how any putative third region affects our pairwise interpretation. If there were a third subcortical region we could not access which was redundant with activity in both frontal and temporal regions, this would not affect our conclusions about the redundancy between this pair of regions in any way. It is still the case that the prediction error decoded from frontal regions is redundant with the prediction error decoded from temporal regions. Observing redundancy tells us that there is the same trial-by-trial predictive content in both areas. If this is the case, whether that same information is available in other parts of the brain is an orthogonal question. In this way, the shift from activity correlation to information-theoretic representational interaction changes the interpretation and reduces the concern about third regions.
Nevertheless, as shown at the beginning of the response, to address this concern we have applied a multivariate analysis approach that uses machine learning to capture the best linear representation of the prediction error signal across a whole region, and we repeat our analysis between the two brain regions of interest (frontal and temporal).
Another, more pragmatic issue with the paper is the lack of clarity in the presentation of the core results, essentially the main body of results from Figure 3 onwards. A large number of temporal co-information diagrams are stacked in these figures with relatively poor guidance for visualization (some colorbars are missing) and with statistically significant effects of an uncertain spatio-temporal structure.
Response: Thanks for the comment. We have now improved the clarity of the core results in the revised version of the manuscript (highlighted in yellow), as well as the interpretation of the synergistic and redundant effects in the Discussion. The main changes are the following:
Lines 186-206: "The dynamics of spatio-temporal synergy in the ERP and BB signals showed complex and heterogeneous patterns between early time points of the auditory electrodes and later time points in the frontal electrodes (Figure 4). For example, while the ERP signals encoded both diagonal (Figure 4A; grey clusters ~100-350 ms after tone presentation) and off-diagonal synergistic patterns (Figure 4A; grey clusters ~150-350 ms after tone presentation), the BB signals mainly showed off-diagonal synergy between temporal and frontal electrodes (Figure 4B; grey clusters ~220-350 ms after tone presentation). In Figure 4A, the diagonal stripes suggest the possibility of oscillatory dynamics, where the representation in frontal regions between 50-300 ms is enhanced by knowledge of the activity of temporal regions ~50 ms earlier (the upper diagonal line). Note that 50 ms peak-to-peak timescale corresponds to a frequency of ~10 Hz, i.e. the alpha range. In Figure 4B the off-diagonal block suggests that the frontal representation of the stimulus between 20-120 ms initiates a state change: later temporal activity (200 ms+) enhances the readout of the stimulus class, even though there is no representation of PE in the BB signal of the temporal area at that time."
Lines 216-224: "In the Local contrast, although we observed temporal synergy in both ERP and BB signals, the off-diagonal synergy was primarily observed between early and late time points of the BB signals in the temporal cortex (Figure 5B; grey clusters ~150-350 ms after tone presentation). The ERP signals, on the other hand, showed diagonal synergy in both the temporal (Figure 5A; grey clusters ~40-150 ms after tone presentation) and frontal cortex (Figure 5C; grey clusters ~150-350 ms after tone presentation)."
Lines 230-235: "In the case of the Global contrast, we observed temporal synergy across early and late time points but mostly in the BB signals both within the auditory (Figure 5F; grey clusters ~0-350 ms after tone presentation) and frontal electrodes (Figure 5H; grey clusters ~230-330 ms after tone presentation)."
Lines 435-445: "The off-diagonal synergy between early and late time points could be a signature of a neural state shift. It is interesting to note that the synergy remains strong over periods after the PE response is no longer represented (i.e. no MI at those time points). However, the initial representation of the PE may have shifted the local network dynamics into a different state. Then knowing this ongoing state improves the readout of the encoded information at the earlier time point. Thus, the off-diagonal synergy might be an echo of the initial PE representation that is not directly observable in later time points."
Lines 446-457: "Synergy can also arise from a common source of neural noise that is non-stimulus specific. For example, the spatio-temporal synergy between regions could reflect a global change in attention or arousal. In this situation, the readout of one area provides information about the global neural state even when it doesn't convey information about the PE directly, and this can be used to improve the resolution with which the PE can be decoded from the other area. Although this might be a possibility, the tight timing of the synergy bands observed in both experiments (i.e. diagonal and off-diagonal synergistic patterns) speaks more of transient dynamics than of global ongoing fluctuations underlying the spatio-temporal synergy."

The data from the computational model of brain networks is also only casually (qualitatively) compared to the empirical electrophysiological data, and related statements such as "[the model] accurately replicates critical neurobiological and neuroanatomical features of the mammalian cortex", "we immediately observed..." etc. are clearly exaggerated.
Response:
Our approach here, which builds upon a number of previous neurocomputational modelling works (e.g. Garagnani et al., 2008; Garagnani and Pulvermuller, 201; Garagnani & Pulvermuller, 2013; Pulvermuller & Garagnani, 2014; Garagnani & Pulvermuller, 2016; Tomasello et al., 2017; Henningsen-Schomers et al., 2023; Shtyrov et al., 2023), is to take into account as many biological constraints considered essential for implementing neurocomputational models of cognitive and brain function (see Pulvermuller et al., 2021) as practically viable given the available computational resources. While models implementing higher levels of biological realism can always be developed, one should also question whether such more refined models would be able to provide a better understanding of the phenomenon of interest. We should also add that we are not aware of any existing neurocomputational model of the relevant marmoset brain areas that can accurately replicate the observed ECoG responses while implementing all of the above neurobiological constraints; hence, we submit that the present effort is a significant step forward in this direction.
In regard to the point concerning the qualitative comparison between model and ECoG results, we fully agree with the Reviewer: simulation and experimental data should be quantitatively compared; this can be done by means of the respective significance plots reported on the right-hand side of each of the co-information, redundancy, and synergy plots (see the grey-level panels in the figures). Following the Reviewer's suggestion, we now report the Structural Similarity Index (SSIM) values obtained by comparing the co-information charts from the simulations against those obtained from experimental data. The SSIM assesses the structural similarity between two images (see https://uk.mathworks.com/help/images/ref/ssim.html), with values ranging from 0 (dissimilar) to 1 (highly similar). Hence, we converted the co-information plots to images and computed SSIM between them: SSIM scores between simulated and real co-information charts were 0.74 for the temporal cortex (Fig. 7B vs. Fig. 3B) and 0.83 for the frontal cortex (Fig. 7C vs. Fig. 3D); both of these similarity indexes are significantly above chance level (p < 0.05), with chance level obtained by re-computing the SSIM index for 1000 randomly shuffled versions of the images corresponding to the simulated data. We have now added this result to the main text. We thank the Reviewer for this valuable suggestion.
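The surrogate test described above can be sketched as follows. Several parts are assumptions rather than the authors' procedure: the response links MATLAB's `ssim`, whereas this sketch substitutes a simplified single-window SSIM computed over the whole image, the shuffling is assumed to permute pixels, and the two toy "co-I images" are synthetic. The function names are hypothetical.

```python
import numpy as np

def global_ssim(a, b, data_range=1.0):
    """Simplified SSIM using one window spanning the whole image (no sliding window)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    luminance = (2 * mu_a * mu_b + c1) / (mu_a**2 + mu_b**2 + c1)
    structure = (2 * cov + c2) / (a.var() + b.var() + c2)
    return luminance * structure

def ssim_permutation_test(real, sim, n_perm=1000, seed=0):
    """Compare the observed SSIM with SSIMs from pixel-shuffled simulated images."""
    rng = np.random.default_rng(seed)
    observed = global_ssim(real, sim)
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(sim.ravel()).reshape(sim.shape)
        null[i] = global_ssim(real, shuffled)
    # One-sided p-value with the standard +1 correction
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value

# Synthetic stand-ins for the experimental and simulated co-I images, scaled to [0, 1]
t = np.linspace(0, 1, 40)
real_img = 0.5 + 0.5 * np.outer(np.sin(2 * np.pi * t), np.cos(2 * np.pi * t))
noise = np.random.default_rng(2).standard_normal(real_img.shape)
sim_img = np.clip(real_img + 0.1 * noise, 0, 1)

observed, p_value = ssim_permutation_test(real_img, sim_img)
print(f"SSIM = {observed:.2f}, p = {p_value:.4f}")
```

Shuffling pixels destroys spatial structure while preserving the distribution of values, so the surrogate distribution reflects the similarity expected from intensity statistics alone. Other shuffling schemes (for example, shuffling rows or time bins) would give a different null and may be closer to what was actually done.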
Lines 336-350: "To quantify the similarity of the co-I values between the real and the simulated data, we computed the Structural Similarity Index (SSIM). The SSIM assesses the structural similarity between two images, with values ranging from 0 (dissimilar) to 1 (highly similar). Hence, we converted the co-I plots of the real and simulated data to images and computed SSIM between them. While the SSIM between simulated and experimental co-I was 0.74 for the temporal cortex (Figure 7B versus Figure 3B), the frontal cortex comparison showed an SSIM of 0.83 between simulated and experimental co-I (Figure 7C versus Figure 3D). Both values were significantly above chance level (p < 0.05) after comparing them to a distribution of surrogate SSIM values. The surrogate distribution was obtained by computing the SSIM between the experimental co-I image and a shuffled version of the simulated co-I image, and repeating this procedure 1000 times."
Last, we acknowledge that the statement "We immediately observed…" could be easily misinterpreted, and apologise for this lack of clarity. What we meant to convey here was that the model taken from the Garagnani & Pulvermüller (2011) study was applied "as is" to the present project, and that, prior to any adjustments or parameter tuning, the network responses already exhibited the presence of synergy; the subsequent process of fine-tuning only improved the fit of the model results with the experimental data from the marmoset, but did not qualitatively change them. We have removed "immediately" and rephrased the statement in question to clarify this. The revised paragraph now reads:
Lines 323-335: "We observed that, before any adjustment of its parameter values, the network already encoded both redundant and synergistic information, specifically, in the signal from its superior-temporal region (including areas A1, AB, PB). We then further constrained the model's dynamics by fine-tuning three of its parameters [...]. This process of parameter tuning [...] did not qualitatively change the network's responses, but simply improved the fit of the responses with the observed data."
Once again, thank you for identifying this potentially misleading/unclear statement.
This paper analyzes synergy between different cortical regions and time delays for conveying prediction error signals. The authors report that interactions are primarily synergistic. The analysis is based on event-related potentials (ERP) and broadband (BB) signals. Overall, the results have the potential to provide a useful contribution to the field. However, the observed size of synergy/redundancy is very small, on the order of ~0.01.

Response:
We thank the reviewer for their comments, which helped us to improve the manuscript. In response, we have now performed some simulations and added some discussion to address the reviewer's points.
We appreciate the reviewer's point regarding the effect size. While 0.01 might seem a low number on an absolute scale (i.e. if a reader is more familiar with effect sizes like d'), we don't think that is true in this study. A key advantage of information theory is that it allows a range of quantitative statistical assessments on the same effect size scale of bits. This is the core unit of information: one bit corresponds to a halving of the uncertainty, or alternatively to one yes-no question which splits the space of possibilities into two halves. It's also important to keep in mind that our measure of information in bits is the average reduction in uncertainty expected on a single trial, from a single sample (MI is calculated per sample). Here the sampling rate is 500 Hz, so 0.01 bits/sample corresponds to around 5 bits/second, which is consistent with some of the highest information-bearing neural signals. (Zhan, 2019). In these diverse applications, it is interesting that consistent, robust within-participant effects often fall in the range of 0.01-0.05 bits/sample, similar to what we report here. So we a priori don't agree that these values are very small, although we accept the units may be unfamiliar. We have now added to the Methods the following description:
Lines 798-804: "Note that MI and co-I values are reported in units of bits. A value of 1 bit corresponds to a halving of uncertainty of the trial state when observing the neural response. It is important to keep in mind though that these information values are the average per sample. Here we use a sampling rate of 500 Hz, so a value of 0.01 bits/sample corresponds to an approximate information rate of 5 bits/second."
correction exists (it is an analytic correction for the sampling bias in estimating the determinant of a covariance matrix from empirical data) (Misra et al., 2005; Goodman, 1963). This is applied for all calculations, so we do report bias-corrected values.
However, we don't agree that correcting for bias is necessary for reliable inference, or that doing so would affect in any way our non-parametric permutation test results. Bias refers to the estimation error that results when mutual information is estimated from a finite data set: a systematic offset in the long-run expected value of the estimate if the experiment were repeated thousands of times. Non-parametric permutation inference works by shuffling away the relationship of interest in the data, to obtain values from a surrogate null distribution. Because this shuffled data has all other properties inherited from the original data set (dimensionality, number of samples, outliers, etc.), the bias of the information estimates obtained from the permutations should be similar to that of the true calculation. However, it is the variance over permutations that is more important for inference. The measured MI values are then compared to this surrogate permutation distribution. For this procedure, bias correction makes no difference: as it is a constant offset (GCMI bias is a function of dimensionality and number of trials), it shifts both surrogate null values and the true value equivalently, so does not affect the percentile of the surrogate null distribution at which the measured value sits. The following simulation illustrates this:
Examples of permutation inference for a weak (left, mean difference between classes 0.1) and strong (right, mean difference between classes 0.2) effect with (upper panels) and without (lower panels) bias correction. The red line shows the value of the simulated data, blue bars show the results of 1000 permutation shuffles. The percentile of the full value with respect to the permutation distribution is shown in the title. Bias correction shifts both the surrogate null permutation distribution and the measured values, but does not change their relationship.
Non-parametric permutation testing and the method of maximum statistics provide a robust approach to inference that accounts for the sample size of the data, as well as any outliers, autocorrelation between samples, etc., and is not affected by bias correction of the mutual information measure.
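The invariance argument can also be checked with a small, independent sketch. This is not the authors' simulation (which is linked separately) but a simplified stand-in under stated assumptions: mutual information is estimated with a plain Gaussian plug-in formula rather than GCMI (no copula normalisation), and `placeholder_bias_bits` is a hypothetical offset that depends only on sample counts, not the actual GCMI bias term. All names and parameter values are illustrative.

```python
import math
import random

def gaussian_mi_bits(values, labels):
    """Plug-in MI (bits) between a class label and a scalar response,
    treating the pooled and per-class responses as Gaussian."""
    def variance(xs):
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / len(xs)

    mi = 0.5 * math.log2(variance(values))
    for c in set(labels):
        xs = [v for v, lab in zip(values, labels) if lab == c]
        mi -= len(xs) / len(values) * 0.5 * math.log2(variance(xs))
    return mi

def placeholder_bias_bits(n_samples, n_classes):
    """Hypothetical bias term (NOT the GCMI formula): any offset that
    depends only on sample counts is identical for every permutation."""
    return (n_classes - 1) / (2 * n_samples * math.log(2))

def percentile(observed, null_values):
    """Percentage of null values lying strictly below the observed value."""
    return 100.0 * sum(v < observed for v in null_values) / len(null_values)

rng = random.Random(1)
n_per_class = 200
labels = [0] * n_per_class + [1] * n_per_class
# Two classes whose means differ by 0.2, as in the "strong" example.
values = ([rng.gauss(0.0, 1.0) for _ in range(n_per_class)]
          + [rng.gauss(0.2, 1.0) for _ in range(n_per_class)])

observed_raw = gaussian_mi_bits(values, labels)
shuffled = list(labels)
null_raw = []
for _ in range(1000):
    rng.shuffle(shuffled)  # breaks the label-response relationship
    null_raw.append(gaussian_mi_bits(values, shuffled))

# Shuffling preserves class sizes, so the same offset applies everywhere.
bias = placeholder_bias_bits(len(values), 2)
observed_corrected = observed_raw - bias
null_corrected = [v - bias for v in null_raw]

pct_raw = percentile(observed_raw, null_raw)
pct_corrected = percentile(observed_corrected, null_corrected)
print(pct_raw, pct_corrected)
```

Because label shuffling preserves the number of samples and the class sizes, the same offset is subtracted from the observed value and from every surrogate value, so their ordering, and hence the percentile, is unchanged.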
The code is available here: https://gist.github.com/robince/de88832a931808f38d137bc1903c81c4
Please note that the error bars were not described in the figure legends, and it is not clear what the shaded regions represent, such as whether these are standard errors of the mean, standard deviations, or 95% confidence intervals.
In fact, the results plotted in Figure 7 (left) are intended to show the fit between the model and experimental data, achieved after fine-tuning the network parameters (i.e., the strength of the neuronal adaptation, the local inhibition, and between-area links, as mentioned in the Results section of the revised version highlighted in yellow).In particular, as explained in the figure's caption, the data from the "Fully-Connected" (FC) model architecture (Figure 7B-C) closely resemble those obtained from the analysis of the Broadband ECoG signals (Figure 3B-D).
Having achieved a good fit between the model and experimental data, we then used the model to make novel predictions. Specifically, we manipulated the network's connectivity structure to investigate the question of how synergistic information might emerge in it. As explained in the revised version, we ran additional simulations using a "Feed-Forward only" (FF) architecture, in which all of the between-area feedback and within-area recurrent links were removed. This was aimed at establishing whether these model links are critical for the emergence of synergistic information within the (simulated) temporal and frontal systems. The FF architecture is shown in Figure 7D (and Figure 8C, too). As the Reviewer correctly noted, the data in Figure 7 (B-C vs. E-F) reveal no major differences across the two architectures. A striking difference between the responses of the FC and FF models, however, does emerge if one analyses the synergistic interactions between the (simulated) frontal and temporal model regions (Figure 8). Specifically, the synergy data in panels B and D of Fig. 8 (see the plots at the bottom-right corners of the two panels) show that, while synergy levels are significant in the FC model (panel B, bottom-right plot), in the FF model they are not (panel D, bottom-right plot). Also, note that this is a quantitative result: panel D shows that there are zero data points with significant synergy, whereas panel B contains significant synergy levels. Additional simulations obtained with a further version of the architecture implementing only next-neighbour between-area links along with recurrent ones (see Figure S10A) produced no synergistic temporal-frontal interactions (Figure S10D), even after further parameter tuning.
The above results, taken together, indicate that a 'feedback' flow of information from frontal to temporal systems (still enabled in the Figure S10A model through the reciprocal PB-PF connection) and information processing via recurrent (within-area) links (also present in the model of Figure S10A) are not, on their own, sufficient to generate significant synergy between the temporal and frontal (model) systems. Instead, it appears that it is the bi-directional higher-order, or "jumping" (Schomers et al., 2017), links reciprocally connecting non-adjacent areas of the FC network (Figure 7A) that are required for significant across-systems synergy to emerge. As we now state in the revised Discussion (sub-section "Interpreting synergistic interactions"), on the basis of this computational result we conjecture that the cortical homologues of such jumping links (known to exist between corresponding regions of the marmoset brain) may play a similarly crucial role in the emergence of the temporal-frontal synergistic interactions observed in the ECoG data. This prediction awaits further experimental validation.

REVIEWERS' COMMENTS
Reviewer #3 (Remarks to the Author): I appreciate the changes introduced by the authors. I have no further questions.
Reviewer #4 (Remarks to the Author): I have received this manuscript on synergistic vs redundant inter-areal communication as a reviewer post its first round of reviews to assess if the authors had taken the initial reviewers' concerns adequately into consideration. After reading the revised manuscript, as well as the correspondence to the previous reviewers, I have to assess that the reviewers' concerns are mostly not taken into consideration, nor properly addressed. The authors disagree largely with very valid criticism on both the conceptual and methodological aspects of the manuscript and largely add descriptions to the manuscript that do not help the caveats mentioned by the reviewers. Overall the lack of isolation and causative inferences make it very hard to interpret the results. The reviewer would like to emphasize that this is true for many studies in this domain but the claims of the paper are a delineation that rests on assumptions not tested or verified. The authors respond to part of that criticism by demonstrating robustness of their results, which is laudable, but ultimately wrong assumptions can be robust nonetheless. Lastly I would like to point at the very small effects that make this reviewer cautious how solid these results are and if they would actually replicate.

Figure S13. Temporal synergy and redundancy within ERP signals in the auditory and frontal electrodes using the MVCo-I Method (Experiment 1: Roving Oddball Task). MVCo-I revealed synergistic and redundant temporal patterns within Temporal ERP (Panel A) and Frontal ERP (Panel B) signals in the auditory cortex. MI (solid traces) between standard and deviant trials for auditory (pink color) and frontal (orange color) responses averaged across the three monkeys. The corresponding electrodes used for the MVCo-I method are depicted in Figure 1B. Error bars represent standard error of the mean (S.E.M.). Temporal co-I was computed within the corresponding signal (ERP) across time points from -100 to 350 ms after tone presentation. The average of the corresponding electrodes across monkeys is shown for the complete co-I chart (red and blue plots); for positive co-I values (redundancy only; red panel); and for negative co-I values (synergy only; blue plot).

Figure S14. Spatio-temporal synergy and redundancy between auditory and frontal ERP signals using the MVCo-I Method (Experiment 1: Roving Oddball Task). (A) MVCo-I revealed synergistic and redundant temporal patterns between temporal and frontal signals. MI (solid traces) between standard and deviant trials for auditory (pink color) and frontal (orange color) responses averaged across the three monkeys. The corresponding electrodes used for the MVCo-I method are depicted in Figure 1B. Error bars represent standard error of the mean (S.E.M.). Temporal co-I was computed within the corresponding signal (ERP) across time points from -100 to 350 ms after tone presentation. The average of the corresponding electrodes across monkeys is shown for the complete co-I chart (red and blue plots); for positive co-I values (redundancy only; red panel); and for negative co-I values (synergy only; blue plot).

Figure S15. Temporal synergy and redundancy within ERP signals in the auditory and frontal electrodes using the MVCo-I Method (Experiment 2: Local/Global Task). MVCo-I revealed synergistic and redundant temporal patterns within auditory ERP signals in the Local (Panel A) and Global (Panel B) contrasts; and within frontal ERP signals in the Local (Panel C) and Global (Panel D) contrasts. MI (solid traces) between standard and deviant trials for auditory (pink color) and frontal (orange color) responses averaged across the two monkeys. The corresponding electrodes used for the MVCo-I method are depicted in Figure 1B. Error bars represent standard error of the mean (S.E.M.). Temporal co-I was computed within the corresponding signal (ERP) across time points from -100 to 350 ms after tone presentation. The average of the corresponding electrodes across monkeys is shown for the complete co-I chart (red and blue plots); for positive co-I values (redundancy only; red panel); and for negative co-I values (synergy only; blue plot).

Figure S16. Spatio-temporal synergy and redundancy between temporal and frontal ERP signals using the MVCo-I Method (Experiment 2: Local/Global Task). MVCo-I revealed synergistic and redundant temporal patterns between auditory and frontal ERP signals in the Local (Panel A) and Global (Panel B) contrasts. MI (solid traces) between standard and deviant trials for auditory (pink color) and frontal (orange color) responses averaged across the two monkeys. The corresponding electrodes used for the MVCo-I method are depicted in Figure 1B. Error bars represent standard error of the mean (S.E.M.). Temporal co-I was computed within the corresponding signal (ERP) across time points from -100 to 350 ms after tone presentation. The average of the corresponding electrodes across monkeys is shown for the complete co-I chart (red and blue plots); for positive co-I values (redundancy only; red panel); and for negative co-I values (synergy only; blue plot).
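The sign convention used in the captions above (positive co-I read as redundancy, negative co-I as synergy) can be illustrated with a discrete toy example. This is a sketch only: it assumes the standard definition co-I = I(X;S) + I(Y;S) - I(X,Y;S), uses binary variables with equally likely outcomes rather than the Gaussian-copula estimates applied to the ECoG data, and all function and variable names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def mutual_information_bits(pairs):
    """MI in bits from a list of equally likely (response, stimulus) outcomes."""
    n = len(pairs)
    joint = Counter(pairs)
    p_resp = Counter(r for r, _ in pairs)
    p_stim = Counter(s for _, s in pairs)
    return sum(
        (count / n) * math.log2((count / n) / ((p_resp[r] / n) * (p_stim[s] / n)))
        for (r, s), count in joint.items()
    )

def co_information_bits(triples):
    """co-I = I(X;S) + I(Y;S) - I(X,Y;S) for equally likely (x, y, s) outcomes."""
    i_x = mutual_information_bits([(x, s) for x, y, s in triples])
    i_y = mutual_information_bits([(y, s) for x, y, s in triples])
    i_xy = mutual_information_bits([((x, y), s) for x, y, s in triples])
    return i_x + i_y - i_xy

# Synergy: S is the XOR of X and Y, so neither response alone says anything
# about S, but the pair determines it completely.
xor_case = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]

# Redundancy: both responses are copies of S and carry the same bit.
copy_case = [(s, s, s) for s in (0, 1)]

co_i_xor = co_information_bits(xor_case)
co_i_copy = co_information_bits(copy_case)
print(co_i_xor, co_i_copy)
```

In the XOR case the joint information exceeds the sum of the individual informations, giving negative co-I; in the copy case the two signals carry the same bit, giving positive co-I.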
Similarly, for comparison, we have used similar MI measures across a range of applications in neuroimaging (Ince et al., 2017), including the face vs. house contrast in EEG (arguably the strongest categorical ERP contrast) (Rousselet et al., 2014), speech entrainment in MEG (Daube et al., 2019), sampling of visual information with bubbles